true

Introduction

This week we tidy the data and use a subset of data for exploratory analysis.

In load_and_clean_data.R file, we made following changes:

Our subset data o_MA is then saved in the dataset file.

Examining o_MA

library(tidyverse)
library(plotly)
o_MA <- read_csv(here::here("dataset/o_MA.csv"))

Plot 1

o_MA %>%
  group_by(d_state_name) %>%
  mutate(sum_n = sum(n)) %>%
  ungroup() %>%
  group_by(d_state_name,race) %>%
  mutate(state_race_pop = sum(n)) %>%
  summarize(sum_n=sum_n,state_race_pop=state_race_pop) %>%
  distinct(d_state_name,race,.keep_all = T) %>%
  filter(d_state_name != "Massachusetts") %>%
  ungroup() %>%
  arrange(desc(sum_n)) %>%
  filter(sum_n > 1650) %>% 
  ggplot() +
  geom_col(aes(x=reorder(d_state_name,sum_n),y=state_race_pop,fill=race))+
  theme(axis.text.x = element_text(angle=45,vjust=1,hjust=1)) +
  labs(title = "Population(16-26) Moved From MA to Other States",
       x = "Destination State", y = "Population From MA to Other States", color="race") + 
  theme(plot.title = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'd_state_name', 'race'. You can override
## using the `.groups` argument.

We found out that the most popular state that people are moving to is New York, which is also the top states that has large populations. Moreover, we found out that within the moving population, the largest race group is white people. Except Georgia, the second largest race group is Hispanic.

Plot 2

o_MA2 <- read_csv(here::here("dataset/o_MA.csv")) %>%
  filter(d_state_name == "Massachusetts") %>%
  filter(o_cz_name != d_cz_name) %>%
  group_by(o_cz_name,d_cz_name) %>%
  summarize(mig = sum(n)) %>%
  ungroup() %>%
  pivot_wider(names_from=(o_cz_name),values_from=(mig)) %>%
  rename(OakBluffs='Oak Bluffs')
## Rows: 84576 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): o_cz_name, o_state_name, d_cz_name, d_state_name, race, income
## dbl (7): o_cz, d_cz, n, n_tot_o, n_tot_d, pr_d_o, pr_o_d
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## `summarise()` has grouped output by 'o_cz_name'. You can override using the `.groups` argument.
o_MA2 %>%  plot_ly(y=~d_cz_name,x=~Nantucket,type='bar',name='Nantucket') %>%
  add_trace(x=~OakBluffs,name='Oak Bluffs') %>%
  add_trace(x=~Pittsfield,name='Pittsfield') %>%
  add_trace(x=~Springfield,name='Springfield') %>%
  add_trace(x=~Boston,name='Boston') %>%
  layout(title = "Migration Patterns Withinin CZ in MA From Age 16 to 26",
         xaxis = list(title="Population From Other CZs in MA at 26"),
         yaxis = list(title="Destination State"),
         legend = list(title="State From"),barmode = 'stack'
         )
## Warning: Ignoring 1 observations

## Warning: Ignoring 1 observations

## Warning: Ignoring 1 observations

## Warning: Ignoring 1 observations

## Warning: Ignoring 1 observations

The number of people who resided in Springfield and the number of people who resided in Boston are relatively similar, and they their populations are the highest among all five community zone. The number of people in Pittsburgh is the second largest among all five community zone; the number of people in Oak Bluffs is the third largest among all five community zone; the number of people in Nantucket is the least largest in all community zone.

Plot 3

o_MA %>%
  filter(d_state_name != "Massachusetts") %>%
  group_by(d_state_name,race) %>%
  summarize(state_race_pop = sum(n)) %>%
  ggplot() +
  geom_col(aes(x=1,y=state_race_pop,fill=race),position = "fill") +
  facet_wrap(~d_state_name) +  
  coord_polar(theta = "y") +
  labs(title = "Migration Race Proportion By Destination States", x="",y="",fill="Race")+
  theme(plot.title = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'd_state_name'. You can override using the
## `.groups` argument.

Among 50 different destination states, white group people are the most likely to migrate. Moreover, white group people make up at least 75 percent of the total migration in each state, except Alabama, Florida, and Georgia. Asian and colored groups have relatively similar migration patterns in most states.

Plot 4

o_MA %>%
  filter(d_state_name!="Massachusetts") %>%
  group_by(d_state_name,income) %>%
  summarize(state_inc_pop = sum(n)) %>%
  ggplot() +
  geom_col(aes(x=1,y=state_inc_pop,fill=income),position = "fill") +
  facet_wrap(~d_state_name) +  
  coord_polar(theta = "y")+
  labs(title="Migration Parental Income Quantile Proportion By Destination States",
       x = "", y="",fill="Income (Q1=poorest,Q5=richest)") + 
   theme(plot.title = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'd_state_name'. You can override using the
## `.groups` argument.

Among 50 states’ migration population, the Q1 income group ( poorest parental income quintile) has the lowest migrant population in most cases. Surprisingly, the migration pattern in 10 of the states is very similar: people from 5 different income groups contribute a relatively equal number of the migrating population.